Inforex - a collaborative system for text corpora annotation and analysis
Abstract
We report the first major upgrade of Inforex, a web-based system for qualitative and collaborative annotation and analysis of text corpora. Inforex is part of the Polish CLARIN infrastructure. It is integrated with a digital repository for storing and publishing language resources, and it allows users to visualize, browse and annotate text corpora stored in the repository. Following a series of workshops for researchers in the Humanities and Social Sciences, we improved the graphical interface to make the system friendlier and more readable for inexperienced users. We also implemented new functionality for gold-standard annotation, which includes private annotations and annotation agreement by a super-annotator.
Similar resources
Inforex - a web-based tool for text corpus management and semantic annotation
The aim of this paper is to present a system for semantic text annotation called Inforex. Inforex is a web-based system designed for managing and annotating text corpora on the semantic level including annotation of Named Entities (NE), anaphora, Word Sense Disambiguation (WSD) and relations between named entities. The system also supports manual text clean-up and automatic text pre-processing ...
Phrase Detectives: A Web-based Collaborative Annotation Game
Annotated corpora of the size needed for modern computational linguistics research cannot be created by small groups of hand annotators. One solution is to exploit collaborative work on the Web and one way to do this is through games like the ESP game. Applying this methodology however requires developing methods for teaching subjects the rules of the game and evaluating their contribution whil...
PACTE : a collaborative platform for textual annotation
In this article, we provide an overview of a web-based text annotation platform, called PACTE. We highlight the various features contributing to making PACTE an ideal platform for research projects involving textual annotation of large corpora performed by geographically distributed teams.
Extracting parallel corpora from comparable documents to improve translation quality in machine translation systems
Data used for training statistical machine translation methods are usually prepared from three kinds of resources: parallel, non-parallel and comparable text corpora. Parallel corpora are an ideal resource for translation, but because such texts are scarce, non-parallel and comparable corpora are also used for parallel text extraction. Most of existing methods for exploiting comparable corpora loo...
Collecting Life Logs for Experience-Based Corpora
In this paper we propose an approach to lightweight acquisition, sharing and annotation of experience-based corpora via mobile devices. Corpus acquisition is a crucial and often costly process in speech and language science and engineering. To address this problem, we have built a system for creating location-based corpora annotated with multimedia tags (e.g. text, speech, image) generated...